Minimal Test Cost Feature Selection with Positive Region Constraint

نویسندگان

  • Jiabin Liu
  • Fan Min
  • Shujiao Liao
  • William Zhu
چکیده

Test cost is often required to obtain feature values of an object. When this issue is involved, people are often interested in schemes minimizing it. In many data mining applications, due to economic, technological and legal reasons, it is neither possible nor necessary to obtain a classifier with 100% accuracy. There may be an industrial standard to indicate the accuracy of the classification. In this paper, we consider such a situation and propose a new constraint satisfaction problem to address it. The constraint is expressed by the positive region; whereas the objective is to minimize the total test cost. The new problem is essentially a dual of the test cost constraint attribute reduction problem, which has been addressed recently. We propose a heuristic algorithm based on the information gain, the test cost, and a user specified parameter λ to deal with the new problem. Experimental results indicate the rational setting of λ is different among datasets, and the algorithm is especially stable when the test cost is subject to the Pareto distribution.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Feature Selection with Positive Region Constraint for Test-Cost-Sensitive Data

In many data mining and machine learning applications, data are not free, and there is a test cost for each data item. Due to economic, technological and legal reasons, it is neither possible nor necessary to obtain a classifier with 100% accuracy. In this paper, we consider such a situation and propose a new constraint satisfaction problem to address it. With this in mind, one has to minimize ...

متن کامل

Feature selection with test cost constraint

Feature selection is an important preprocessing step in machine learning and data mining. In real-world applications, costs, including money, time and other resources, are required to acquire the features. In some cases, there is a test cost constraint due to limited resources. We shall deliberately select an informative and cheap feature subset for classification. This paper proposes the featu...

متن کامل

A Novel Architecture for Detecting Phishing Webpages using Cost-based Feature Selection

Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether it is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy and system response time. The major time consumed by PWDS arises from feature extraction that ...

متن کامل

Minimal cost feature selection of data with normal distribution measurement errors

Minimal cost feature selection is devoted to obtain a trade-off between test costs and misclassification costs. This issue has been addressed recently on nominal data. In this paper, we consider numerical data with measurement errors and study minimal cost feature selection in this model. First, we build a data model with normal distribution measurement errors. Second, the neighborhood of each ...

متن کامل

Feature Selection in Structural Health Monitoring Big Data Using a Meta-Heuristic Optimization Algorithm

This paper focuses on the processing of structural health monitoring (SHM) big data. Extracted features of a  structure are reduced using an optimization algorithm to find a minimal subset of salient features by removing noisy, irrelevant and redundant data. The PSO-Harmony algorithm is introduced for feature selection to enhance the capability of the proposed method for processing the  measure...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012